This project analyses the LEGO database published on the Rebrickable website. The report consists of the following sections:
The analysis shows that interest in LEGO has grown steadily since the bricks were first released. Although growth in the number of new sets slowed in some years, the overall trend is upward. LEGO's popularity stems in part from its many partnerships with well-known franchises such as Star Wars and Harry Potter. The forecasts produced in this report indicate that the number of LEGO bricks produced will keep growing.
The following libraries were used in the project:
To make the results reproducible, the random seed was set to 23.
```r
set.seed(23)
```
```r
# Load the Rebrickable CSV exports from the working directory.
colors_raw <- read.csv("colors.csv", header = TRUE, sep = ",")
elements_raw <- read.csv("elements.csv", header = TRUE, sep = ",")
inventories_raw <- read.csv("inventories.csv", header = TRUE, sep = ",")
inventory_minifigs_raw <- read.csv("inventory_minifigs.csv", header = TRUE, sep = ",")
inventory_parts_raw <- read.csv("inventory_parts.csv", header = TRUE, sep = ",")
inventory_sets_raw <- read.csv("inventory_sets.csv", header = TRUE, sep = ",")
minifigs_raw <- read.csv("minifigs.csv", header = TRUE, sep = ",")
part_categories_raw <- read.csv("part_categories.csv", header = TRUE, sep = ",")
part_relationships_raw <- read.csv("part_relationships.csv", header = TRUE, sep = ",")
parts_raw <- read.csv("parts.csv", header = TRUE, sep = ",")
sets_raw <- read.csv("sets.csv", header = TRUE, sep = ",")
themes_raw <- read.csv("themes.csv", header = TRUE, sep = ",")
```
The database is complete in the sense that no table contains `NA` values, with one exception: the `themes` table, whose `parent_id` column is empty for 145 records because those themes are top-level themes (e.g. Technic, Town, City). Note, however, that the `img_url` column of `inventory_parts` contains 8180 empty strings, which `read.csv()` does not treat as missing. In addition, the `design_id` column was removed from the `elements` table, because its meaning is unclear and no other table refers to it.
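A minimal sketch of the completeness check described above, using a hypothetical helper `count_missing`. It reports `NA` values and empty strings separately, because `read.csv()` converts blank numeric fields (such as an empty `parent_id`) to `NA` but keeps blank character fields (such as an empty `img_url`) as `""`. The toy rows are copied from the themes preview in this report; on the real data the helper would be applied to each `*_clean` table.

```r
# Hypothetical helper: count NA values and empty strings in each column.
# read.csv() turns blank numeric fields into NA but keeps blank character
# fields as "", so the two are reported separately.
count_missing <- function(df) {
  n_na <- vapply(df, function(col) sum(is.na(col)), integer(1))
  n_empty <- vapply(df, function(col) {
    if (is.character(col)) sum(col == "", na.rm = TRUE) else 0L
  }, integer(1))
  data.frame(column = names(df), n_na = unname(n_na),
             n_empty = unname(n_empty), stringsAsFactors = FALSE)
}

# Toy rows copied from the themes preview in this report; in practice the
# helper would be applied to themes_clean, inventory_parts_clean, and so on.
themes_demo <- data.frame(
  id = c(1, 3, 4),
  name = c("Technic", "Competition", "Expert Builder"),
  parent_id = c(NA, 1, 1),
  stringsAsFactors = FALSE
)
count_missing(themes_demo)
```

Applied to `inventory_parts_clean`, the `n_empty` column should match the 8180 empty `img_url` values listed in the summary table.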
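```r
# Apart from dropping design_id, the raw tables need no cleaning.
colors_clean <- colors_raw
# design_id is dropped by name rather than by position ([, -4]),
# so the code does not depend on the column order.
elements_clean <- elements_raw[, setdiff(names(elements_raw), "design_id")]
inventories_clean <- inventories_raw
inventory_minifigs_clean <- inventory_minifigs_raw
inventory_parts_clean <- inventory_parts_raw
inventory_sets_clean <- inventory_sets_raw
minifigs_clean <- minifigs_raw
part_categories_clean <- part_categories_raw
part_relationships_clean <- part_relationships_raw
parts_clean <- parts_raw
sets_clean <- sets_raw
themes_clean <- themes_raw
```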
| id | name | rgb | is_trans |
|---|---|---|---|
| -1 | [Unknown] | 0033B2 | f |
| 0 | Black | 05131D | f |
| 1 | Blue | 0055BF | f |
| 2 | Green | 237841 | f |
| 3 | Dark Turquoise | 008F9B | f |
| 4 | Red | C91A09 | f |
| Name | colors_clean |
|---|---|
| Number of rows | 263 |
| Number of columns | 4 |
| _______________________ | |
| Column type frequency: | |
| character | 3 |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| name | 0 | 1 | 3 | 28 | 0 | 263 | 0 |
| rgb | 0 | 1 | 6 | 6 | 0 | 223 | 0 |
| is_trans | 0 | 1 | 1 | 1 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| id | 0 | 1 | 651.38 | 750.55 | -1 | 83 | 1005 | 1070.5 | 9999 | ▇▁▁▁▁ |
| element_id | part_num | color_id |
|---|---|---|
| 6443403 | 2277c01pr0009 | 1 |
| 6300211 | 67906c01 | 14 |
| 4566309 | 2564 | 0 |
| 4275423 | 53657 | 1004 |
| 6194308 | 92926 | 71 |
| 6229123 | 26561 | 4 |
| Name | elements_clean |
|---|---|
| Number of rows | 84138 |
| Number of columns | 3 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| numeric | 2 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| part_num | 0 | 1 | 2 | 19 | 0 | 33765 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| element_id | 0 | 1 | 5222065.12 | 1596842.63 | 9327 | 4259774 | 6057754 | 6262025 | 61532443 | ▇▁▁▁▁ |
| color_id | 0 | 1 | 539.67 | 2044.86 | -1 | 8 | 28 | 135 | 9999 | ▇▁▁▁▁ |
| id | version | set_num |
|---|---|---|
| 1 | 1 | 7922-1 |
| 3 | 1 | 3931-1 |
| 4 | 1 | 6942-1 |
| 15 | 1 | 5158-1 |
| 16 | 1 | 903-1 |
| 17 | 1 | 850950-1 |
| Name | inventories_clean |
|---|---|
| Number of rows | 37265 |
| Number of columns | 3 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| numeric | 2 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| set_num | 0 | 1 | 3 | 20 | 0 | 35644 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| id | 0 | 1 | 61103.60 | 51380.10 | 1 | 14424 | 54379 | 88842 | 194312 | ▇▆▂▂▂ |
| version | 0 | 1 | 1.09 | 0.58 | 1 | 1 | 1 | 1 | 16 | ▇▁▁▁▁ |
| inventory_id | fig_num | quantity |
|---|---|---|
| 3 | fig-001549 | 1 |
| 4 | fig-000764 | 1 |
| 19 | fig-000555 | 1 |
| 25 | fig-000574 | 1 |
| 26 | fig-000842 | 1 |
| 26 | fig-008641 | 1 |
| Name | inventory_minifigs_clean |
|---|---|
| Number of rows | 20858 |
| Number of columns | 3 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| numeric | 2 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| fig_num | 0 | 1 | 10 | 10 | 0 | 13455 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| inventory_id | 0 | 1 | 43010.44 | 52256.78 | 3 | 7869 | 15681 | 66834 | 194312 | ▇▁▁▁▁ |
| quantity | 0 | 1 | 1.06 | 0.78 | 1 | 1 | 1 | 1 | 100 | ▇▁▁▁▁ |
| inventory_id | part_num | color_id | quantity | is_spare | img_url |
|---|---|---|---|---|---|
| 1 | 48379c01 | 72 | 1 | f | https://cdn.rebrickable.com/media/parts/photos/1/48379c01-1-e7daa845-2671-4737-8642-3b1574308155.jpg |
| 1 | 48395 | 7 | 1 | f | https://cdn.rebrickable.com/media/parts/photos/7/48395-7-b9152acf-2fa5-4836-a04d-5b7fd39c2406.jpg |
| 1 | stickerupn0077 | 9999 | 1 | f | |
| 1 | upn0342 | 0 | 1 | f | |
| 1 | upn0350 | 25 | 1 | f | |
| 3 | 2343 | 47 | 1 | f | https://cdn.rebrickable.com/media/parts/elements/3000240.jpg |
| Name | inventory_parts_clean |
|---|---|
| Number of rows | 1180987 |
| Number of columns | 6 |
| _______________________ | |
| Column type frequency: | |
| character | 3 |
| numeric | 3 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| part_num | 0 | 1 | 1 | 20 | 0 | 51051 | 0 |
| is_spare | 0 | 1 | 1 | 1 | 0 | 2 | 0 |
| img_url | 0 | 1 | 0 | 117 | 8180 | 74266 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| inventory_id | 0 | 1 | 50849.46 | 55136.94 | 1 | 9404 | 22838 | 87088 | 194312 | ▇▂▁▂▁ |
| color_id | 0 | 1 | 131.78 | 862.38 | -1 | 4 | 15 | 71 | 9999 | ▇▁▁▁▁ |
| quantity | 0 | 1 | 3.37 | 9.95 | 1 | 1 | 2 | 4 | 3064 | ▇▁▁▁▁ |
| inventory_id | set_num | quantity |
|---|---|---|
| 35 | 75911-1 | 1 |
| 35 | 75912-1 | 1 |
| 39 | 75048-1 | 1 |
| 39 | 75053-1 | 1 |
| 50 | 4515-1 | 1 |
| 50 | 4520-1 | 2 |
| Name | inventory_sets_clean |
|---|---|
| Number of rows | 4358 |
| Number of columns | 3 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| numeric | 2 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| set_num | 0 | 1 | 5 | 20 | 0 | 3171 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| inventory_id | 0 | 1 | 52518.95 | 59063.13 | 35 | 8076 | 16423 | 98685 | 191576 | ▇▁▁▂▁ |
| quantity | 0 | 1 | 1.81 | 5.67 | 1 | 1 | 1 | 1 | 60 | ▇▁▁▁▁ |
| fig_num | name | num_parts | img_url |
|---|---|---|---|
| fig-000001 | Toy Store Employee | 4 | https://cdn.rebrickable.com/media/sets/fig-000001.jpg |
| fig-000002 | Customer Kid | 4 | https://cdn.rebrickable.com/media/sets/fig-000002.jpg |
| fig-000003 | Assassin Droid, White | 8 | https://cdn.rebrickable.com/media/sets/fig-000003.jpg |
| fig-000004 | Man, White Torso, Black Legs, Brown Hair | 4 | https://cdn.rebrickable.com/media/sets/fig-000004.jpg |
| fig-000005 | Captain America with Short Legs | 3 | https://cdn.rebrickable.com/media/sets/fig-000005.jpg |
| fig-000006 | Lloyd Avatar | 5 | https://cdn.rebrickable.com/media/sets/fig-000006.jpg |
| Name | minifigs_clean |
|---|---|
| Number of rows | 13764 |
| Number of columns | 4 |
| _______________________ | |
| Column type frequency: | |
| character | 3 |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| fig_num | 0 | 1 | 10 | 10 | 0 | 13764 | 0 |
| name | 0 | 1 | 1 | 148 | 0 | 13354 | 0 |
| img_url | 0 | 1 | 53 | 53 | 0 | 13764 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| num_parts | 0 | 1 | 5.3 | 6.03 | 0 | 4 | 4 | 5 | 156 | ▇▁▁▁▁ |
| id | name |
|---|---|
| 1 | Baseplates |
| 3 | Bricks Sloped |
| 4 | Duplo, Quatro and Primo |
| 5 | Bricks Special |
| 6 | Bricks Wedged |
| 7 | Containers |
| Name | part_categories_clean |
|---|---|
| Number of rows | 66 |
| Number of columns | 2 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| name | 0 | 1 | 4 | 44 | 0 | 66 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| id | 0 | 1 | 35.36 | 19.41 | 1 | 19.25 | 35.5 | 51.75 | 68 | ▇▇▇▇▇ |
| rel_type | child_part_num | parent_part_num |
|---|---|---|
| P | 3626cpr3662 | 3626c |
| P | 87079pr9974 | 87079 |
| P | 3960pr9971 | 3960 |
| R | 98653pr0003 | 98086pr0003 |
| R | 98653pr0003 | 98088pat0003 |
| R | 98653pr0003 | 98089pat0003 |
| Name | part_relationships_clean |
|---|---|
| Number of rows | 29977 |
| Number of columns | 3 |
| _______________________ | |
| Column type frequency: | |
| character | 3 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| rel_type | 0 | 1 | 1 | 1 | 0 | 6 | 0 |
| child_part_num | 0 | 1 | 1 | 20 | 0 | 27139 | 0 |
| parent_part_num | 0 | 1 | 1 | 19 | 0 | 4725 | 0 |
| part_num | name | part_cat_id | part_material |
|---|---|---|---|
| 003381 | Sticker Sheet for Set 663-1 | 58 | Plastic |
| 003383 | Sticker Sheet for Sets 618-1, 628-2 | 58 | Plastic |
| 003402 | Sticker Sheet for Sets 310-3, 311-1, 312-3 | 58 | Plastic |
| 003429 | Sticker Sheet for Set 1550-1 | 58 | Plastic |
| 003432 | Sticker Sheet for Sets 357-1, 355-1, 940-1 | 58 | Plastic |
| 003434 | Sticker Sheet for Set 575-2, 653-1, 460-1 | 58 | Plastic |
| Name | parts_clean |
|---|---|
| Number of rows | 52615 |
| Number of columns | 4 |
| _______________________ | |
| Column type frequency: | |
| character | 3 |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| part_num | 0 | 1 | 1 | 20 | 0 | 52615 | 0 |
| name | 0 | 1 | 3 | 222 | 0 | 52103 | 0 |
| part_material | 0 | 1 | 4 | 16 | 0 | 7 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| part_cat_id | 0 | 1 | 38.91 | 22.08 | 1 | 17 | 41 | 60 | 68 | ▃▃▂▁▇ |
| set_num | name | year | theme_id | num_parts | img_url |
|---|---|---|---|---|---|
| 001-1 | Gears | 1965 | 1 | 43 | https://cdn.rebrickable.com/media/sets/001-1.jpg |
| 0011-2 | Town Mini-Figures | 1979 | 67 | 12 | https://cdn.rebrickable.com/media/sets/0011-2.jpg |
| 0011-3 | Castle 2 for 1 Bonus Offer | 1987 | 199 | 0 | https://cdn.rebrickable.com/media/sets/0011-3.jpg |
| 0012-1 | Space Mini-Figures | 1979 | 143 | 12 | https://cdn.rebrickable.com/media/sets/0012-1.jpg |
| 0013-1 | Space Mini-Figures | 1979 | 143 | 12 | https://cdn.rebrickable.com/media/sets/0013-1.jpg |
| 0014-1 | Space Mini-Figures | 1979 | 143 | 2 | https://cdn.rebrickable.com/media/sets/0014-1.jpg |
| Name | sets_clean |
|---|---|
| Number of rows | 21880 |
| Number of columns | 6 |
| _______________________ | |
| Column type frequency: | |
| character | 3 |
| numeric | 3 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| set_num | 0 | 1 | 3 | 20 | 0 | 21880 | 0 |
| name | 0 | 1 | 2 | 93 | 0 | 18752 | 0 |
| img_url | 0 | 1 | 46 | 63 | 0 | 21880 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| year | 0 | 1 | 2007.76 | 13.96 | 1949 | 2001 | 2012 | 2018 | 2024 | ▁▁▁▃▇ |
| theme_id | 0 | 1 | 441.97 | 215.53 | 1 | 273 | 497 | 608 | 752 | ▃▃▃▇▇ |
| num_parts | 0 | 1 | 161.38 | 418.14 | 0 | 3 | 31 | 139 | 11695 | ▇▁▁▁▁ |
| id | name | parent_id |
|---|---|---|
| 1 | Technic | NA |
| 3 | Competition | 1 |
| 4 | Expert Builder | 1 |
| 16 | RoboRiders | 1 |
| 17 | Speed Slammers | 1 |
| 18 | Star Wars | 1 |
| Name | themes_clean |
|---|---|
| Number of rows | 468 |
| Number of columns | 3 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| numeric | 2 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| name | 0 | 1 | 2 | 42 | 0 | 385 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| id | 0 | 1.00 | 433.46 | 216.55 | 1 | 250.5 | 466 | 625.25 | 752 | ▅▅▅▆▇ |
| parent_id | 145 | 0.69 | 360.64 | 197.19 | 1 | 186.0 | 411 | 512.50 | 697 | ▅▃▂▇▂ |
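The 145 missing `parent_id` values above mark the top-level themes. To attribute a sub-theme to its top-level theme, one can follow `parent_id` links upward until reaching a row whose `parent_id` is `NA`. The sketch below assumes the hierarchy contains no cycles; `root_theme` is a hypothetical helper, the first three toy rows are taken from the themes preview, and the fourth row is invented to show a two-level chain.

```r
# Hypothetical helper: follow parent_id links until a theme with
# parent_id = NA (a top-level theme) is reached. Assumes no cycles.
root_theme <- function(theme_id, themes) {
  parent_of <- setNames(themes$parent_id, themes$id)
  current <- theme_id
  repeat {
    parent <- parent_of[[as.character(current)]]
    if (is.na(parent)) return(current)
    current <- parent
  }
}

# Rows 1-3 come from the themes preview; row 4 is invented for illustration.
themes_demo <- data.frame(
  id = c(1, 3, 18, 999),
  name = c("Technic", "Competition", "Star Wars", "Hypothetical nested theme"),
  parent_id = c(NA, 1, 1, 18),
  stringsAsFactors = FALSE
)
themes_demo$root_id <- vapply(themes_demo$id, root_theme, numeric(1),
                              themes = themes_demo)
themes_demo
```

Every row here resolves to Technic (id 1). With 468 themes, rebuilding the lookup vector on each call is cheap enough; for larger hierarchies it could be built once outside the function.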
| Name | Count |
|---|---|
| Series 19 - Random Bag | 110 |
| Harry Potter Stickers and Cards - Random Pack | 61 |
| Unikitty! Series 1 - Random Bag | 60 |
| Series 9 - Random Bag | 60 |
| Series 10 - Random Bag | 60 |
| Series 11 - Random Bag | 60 |
| The LEGO Movie Series 1 - Random Bag | 60 |
| The Simpsons Series 1 - Random Bag | 60 |
| Series 12 - Random Bag | 60 |
| Series 13 - Random Bag | 60 |
| The Simpsons Series 2 - Random Bag | 60 |
| Series 14 (Monsters) - Random Bag | 60 |
| Series 15 - Random Bag | 60 |
| Disney Series 1 - Random Bag | 60 |
| Series 16 - Random Bag | 60 |
| DFB (Deutscher Fussball-bund) - Random Bag | 60 |
| The LEGO Batman Movie Series 1 - Random Bag | 60 |
| Series 17 - Random Bag | 60 |
| The LEGO Ninjago Movie - Random Bag | 60 |
| The LEGO Batman Movie Series 2 - Random Bag | 60 |
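A ranking of sets by minifigure count can be produced by joining three tables: `inventory_minifigs` links to `inventories` through `inventory_id`, and `inventories` links to `sets` through `set_num`. A minimal sketch under two assumptions: the count is the sum of `quantity` (the report does not say whether it summed quantities or counted rows), and each inventory version is counted separately, so a set with several versions (the `version` column reaches 16) would be counted more than once unless it is filtered first. `top_sets_by_minifigs` and the toy rows are hypothetical.

```r
# Hypothetical helper: rank sets by the number of minifigures they contain.
# Join path: inventory_minifigs.inventory_id -> inventories.id -> set_num.
# Summing `quantity` (rather than counting rows) is an assumption.
top_sets_by_minifigs <- function(inventory_minifigs, inventories, sets, n = 20) {
  figs <- merge(inventory_minifigs, inventories[, c("id", "set_num")],
                by.x = "inventory_id", by.y = "id")
  per_set <- aggregate(quantity ~ set_num, data = figs, FUN = sum)
  per_set <- merge(per_set, sets[, c("set_num", "name")], by = "set_num")
  per_set <- per_set[order(-per_set$quantity), c("name", "quantity")]
  head(per_set, n)
}

# Hypothetical toy rows, not taken from the Rebrickable data.
inv_figs <- data.frame(inventory_id = c(10, 10, 20),
                       fig_num = c("fig-a", "fig-b", "fig-c"),
                       quantity = c(1, 2, 1), stringsAsFactors = FALSE)
invs <- data.frame(id = c(10, 20), version = c(1, 1),
                   set_num = c("A-1", "B-1"), stringsAsFactors = FALSE)
sets_demo <- data.frame(set_num = c("A-1", "B-1"), name = c("Set A", "Set B"),
                        stringsAsFactors = FALSE)
top_sets_by_minifigs(inv_figs, invs, sets_demo)
```

On the real data the call would be `top_sets_by_minifigs(inventory_minifigs_clean, inventories_clean, sets_clean)`.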
| Name | Count |
|---|---|
| Woman, Blue Torso with White Arms, White Legs | 101 |
| Battle Droid, One Bent Arm, One Straight Arm | 96 |
| Skeleton, Standard Face, Ball Joint Arms (3626b Head) | 59 |
| Classic Spaceman, White with Airtanks (3842a Helmet) | 51 |
| Classic Spaceman, Red with Airtanks (3842a Helmet) | 50 |
| Battle Droid, Two Bent Arms | 50 |
| Pit Crew, Red Torso, Red Legs, Ferrari | 34 |
| Man, Blue Shirt, Blue Legs, Red Hard Hat | 30 |
| Martian | 29 |
| Classic Spaceman, Yellow with Airtanks (3842b Helmet) | 27 |
| Steve | 27 |
| Policeman, Black Suit with Pocket and Badge, White Hat (3626a Head) | 26 |
| Blacktron I (3626a Head) | 26 |
| Skeleton, Standard Face, Bent Arms, Vertical Hand Clips, 60115 Torso | 26 |
| Soldier (Imperial Soldier) - Backpack | 26 |
| Blacktron II - 3626a Head | 23 |
| Clone Trooper, Phase I Armor, Brown Eyes | 23 |
| Skeleton, Red Eyes | 22 |
| Grunt / Roar / Growl / Snort / Bob | 22 |
| Johnny Thunder (Desert) | 21 |
| Name | Count |
|---|---|
| Plate 1 x 2 | 127955 |
| Plate Round 1 x 1 with Solid Stud | 119278 |
| Brick 1 x 2 | 99847 |
| Plate 1 x 1 | 92323 |
| Brick 1 x 1 | 75605 |
| Technic Pin with Friction Ridges Lengthwise and Center Slots | 68443 |
| Plate 1 x 4 | 57805 |
| Brick 2 x 2 | 55379 |
| Slope 30° 1 x 1 x 2/3 (Cheese Slope) | 50685 |
| Tile Round 1 x 1 | 50410 |
| Tile 1 x 2 with Groove | 49734 |
| Plate 2 x 4 | 47320 |
| Brick 1 x 4 | 46258 |
| Plate 2 x 2 | 43999 |
| Brick 2 x 4 | 42421 |
| Tile 1 x 1 with Groove | 37508 |
| Plate 2 x 3 | 34972 |
| Plate 1 x 6 | 33226 |
| Tile Special 1 x 2 Grille with Bottom Groove | 31480 |
| Brick Round 1 x 1 Open Stud | 30983 |
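A similar aggregation ranks parts by their total quantity across all inventories. The report does not say whether spare parts were included, so this hypothetical helper exposes that choice as an `include_spares` argument; `is_spare` holds `"t"`/`"f"` strings in the CSV export. The toy quantities are invented for illustration.

```r
# Hypothetical helper: rank parts by total quantity across all inventories.
top_parts_by_quantity <- function(inventory_parts, parts, n = 20,
                                  include_spares = TRUE) {
  if (!include_spares) {
    # is_spare is stored as "t"/"f" strings in the CSV export
    inventory_parts <- inventory_parts[inventory_parts$is_spare == "f", ]
  }
  totals <- aggregate(quantity ~ part_num, data = inventory_parts, FUN = sum)
  totals <- merge(totals, parts[, c("part_num", "name")], by = "part_num")
  totals <- totals[order(-totals$quantity), c("name", "quantity")]
  head(totals, n)
}

# Invented toy quantities for two real part numbers.
inv_parts <- data.frame(part_num = c("3023", "3023", "3005", "3005"),
                        quantity = c(4, 2, 3, 1),
                        is_spare = c("f", "t", "f", "f"),
                        stringsAsFactors = FALSE)
parts_demo <- data.frame(part_num = c("3023", "3005"),
                         name = c("Plate 1 x 2", "Brick 1 x 1"),
                         stringsAsFactors = FALSE)
top_parts_by_quantity(inv_parts, parts_demo)
top_parts_by_quantity(inv_parts, parts_demo, include_spares = FALSE)
```

Excluding spares changes the toy ranking from 6 vs 4 to a 4 vs 4 tie, which shows why the choice matters for a top-20 list.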
| Name | Count |
|---|---|
| Gear | 3265 |
| Duplo | 1293 |
| Books | 981 |
| Star Wars | 935 |
| Collectible Minifigures | 858 |
| City | 851 |
| Service Packs | 781 |
| Town | 765 |
| Educational and Dacta | 678 |
| Technic | 558 |
| Friends | 549 |
| Ninjago | 533 |
| Creator | 477 |
| System | 464 |
| Bionicle | 460 |
| Universal Building Set | 412 |
| LEGO Brand Store | 392 |
| Space | 334 |
| Seasonal | 303 |
| Super Heroes Marvel | 293 |
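A theme ranking like this one can be sketched by joining `sets` to `themes` on `theme_id` and counting sets per theme name. Two assumptions apply: each set is counted under its own theme, without rolling sub-themes up to their parent, and because theme names are not unique (385 distinct names among 468 themes), identically named sub-themes are pooled. `top_themes_by_sets` and the toy rows are hypothetical.

```r
# Hypothetical helper: count sets per theme name and keep the most frequent.
top_themes_by_sets <- function(sets, themes, n = 20) {
  named <- merge(sets[, c("set_num", "theme_id")], themes[, c("id", "name")],
                 by.x = "theme_id", by.y = "id")
  counts <- as.data.frame(table(named$name), stringsAsFactors = FALSE)
  names(counts) <- c("name", "n_sets")
  head(counts[order(-counts$n_sets), ], n)
}

# Hypothetical toy rows.
sets_demo <- data.frame(set_num = c("s1", "s2", "s3", "s4"),
                        theme_id = c(1, 18, 18, 3), stringsAsFactors = FALSE)
themes_demo <- data.frame(id = c(1, 3, 18),
                          name = c("Technic", "Competition", "Star Wars"),
                          stringsAsFactors = FALSE)
top_themes_by_sets(sets_demo, themes_demo)
```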
The correlation between the number of parts and the year is:
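The report does not show how the correlation was computed. A minimal sketch, assuming a per-set correlation between `year` and `num_parts` in `sets_clean`; `parts_year_correlation` is a hypothetical helper. Because `num_parts` is strongly right-skewed (median 31, maximum 11695), a rank-based Spearman coefficient is worth computing alongside Pearson's. The six toy rows are the `sets` preview shown earlier, far too few for a meaningful estimate.

```r
# Hypothetical helper: correlation between release year and set size.
parts_year_correlation <- function(sets, method = "pearson") {
  cor(sets$year, sets$num_parts, method = method)
}

# Rows from the sets preview in this report.
sets_preview <- data.frame(
  set_num = c("001-1", "0011-2", "0011-3", "0012-1", "0013-1", "0014-1"),
  year = c(1965, 1979, 1987, 1979, 1979, 1979),
  num_parts = c(43, 12, 0, 12, 12, 2)
)
parts_year_correlation(sets_preview)
parts_year_correlation(sets_preview, method = "spearman")
```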
According to the model:
According to the model:
According to the model:
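The report does not state which model produced these forecasts. Purely as an illustration, the sketch below fits a linear trend to the number of sets released per year and extrapolates it. `forecast_sets_per_year` is hypothetical, and its `exclude_years` argument allows dropping a year that may be incomplete in the export (for example the most recent one) before fitting. The toy data are invented so that the trend is exactly linear.

```r
# Hypothetical helper: linear trend of sets released per year, extrapolated.
# Illustrative only; the report's own model is not specified.
forecast_sets_per_year <- function(sets, future_years,
                                   exclude_years = numeric(0)) {
  per_year <- as.data.frame(table(sets$year), stringsAsFactors = FALSE)
  names(per_year) <- c("year", "n_sets")
  per_year$year <- as.numeric(per_year$year)
  per_year <- per_year[!per_year$year %in% exclude_years, ]
  fit <- lm(n_sets ~ year, data = per_year)
  data.frame(year = future_years,
             predicted = unname(predict(fit,
                                        newdata = data.frame(year = future_years))))
}

# Invented toy data: 1, 2 and 3 sets in 2000, 2001 and 2002.
sets_toy <- data.frame(year = c(2000, 2001, 2001, 2002, 2002, 2002))
forecast_sets_per_year(sets_toy, future_years = c(2003, 2004))
# predicted values are approximately 4 and 5
```

On the real data the call would look like `forecast_sets_per_year(sets_clean, future_years = 2025:2030)`. A linear trend cannot capture the slowdowns mentioned in the summary, so it is a baseline rather than a replacement for the report's model.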